ONE-BIT COMPRESSED SENSING BY LINEAR PROGRAMMING 



YANIV PLAN AND ROMAN VERSHYNIN 

Abstract. We give the first computationally tractable and almost optimal solution to the problem 
of one-bit compressed sensing, showing how to accurately recover an $s$-sparse vector $x \in \mathbb{R}^n$ from
the signs of $O(s \log^2(n/s))$ random linear measurements of $x$. The recovery is achieved by a simple
linear program. This result extends to approximately sparse vectors x. Our result is universal in 
the sense that with high probability, one measurement scheme will successfully recover all sparse 
vectors simultaneously. The argument is based on solving an equivalent geometric problem on 
random hyperplane tessellations. 



1. Introduction 

Compressed sensing is a modern paradigm of data acquisition, which is having an impact on 
several disciplines, see [21]. The scientist has access to a measurement vector $v \in \mathbb{R}^m$ obtained as

(1.1) v = Ax, 

where $A$ is a given $m \times n$ measurement matrix and $x \in \mathbb{R}^n$ is an unknown signal that one needs to
recover from $v$. One would like to take $m \ll n$, rendering $A$ non-invertible; the key ingredient to
successful recovery of $x$ is to take into account its assumed structure, namely sparsity. Thus one assumes that
$x$ has at most $s$ nonzero entries, although the support pattern is unknown. The strongest known
results are for random measurement matrices $A$. In particular, if $A$ has Gaussian i.i.d. entries,
then we may take $m = O(s \log(n/s))$ and still recover $x$ exactly with high probability [10, 7]; see
[26] for an overview. Furthermore, this recovery may be achieved in polynomial time by solving 
the convex minimization program 

(1.2)    $\min \|x'\|_1$ subject to $Ax' = v$.

Stability results are also available when noise is added to the problem [9, 8, 3, 27]. 

However, while the focus of compressed sensing is signal recovery with minimal information, the 
classical set-up (1.1), (1.2) assumes infinite bit precision of the measurements. This disaccord raises 
an important question: how many bits per measurement (i.e. per coordinate of v) are sufficient for 
tractable and accurate sparse recovery? This paper shows that one bit per measurement is enough. 

There are many applications where such severe quantization may be inherent or preferred - 
analog-to-digital conversion [20, 18], binomial regression in statistical modeling and threshold group 
testing [12], to name a few. 

1.1. Main results. This paper demonstrates that a simple modification of the convex program 
(1.2) is able to accurately estimate $x$ from the extremely quantized measurement vector

$y = \operatorname{sign}(Ax)$.

Here $y$ is the vector of signs of the coordinates of $Ax$.$^1$

Note that y contains no information about the magnitude of x, and thus we can only hope to 
recover the normalized vector $x/\|x\|_2$. This problem was introduced and first studied by Boufounos
and Baraniuk [6] under the name of one-bit compressed sensing; some related work is summarized 
in Section 1.2. 
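
The loss of magnitude information can be seen in a small sketch. The helper names (`sign`, `one_bit_measurements`) and the toy dimensions below are illustrative assumptions, not part of the paper; the scaling factor is a power of two so that the comparison is exact in floating point.

```python
import random

def sign(z):
    """sign(z) = z/|z| for z != 0 and sign(0) = 0, matching the convention in the text."""
    return (z > 0) - (z < 0)

def one_bit_measurements(A, x):
    """Return y = sign(Ax), where A is given as a list of rows (illustrative helper)."""
    return [sign(sum(a_ij * x_j for a_ij, x_j in zip(row, x))) for row in A]

rng = random.Random(0)
m, n = 20, 6  # toy sizes, chosen only for illustration
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

x = [3.0, 0.0, -1.5, 0.0, 0.0, 0.0]
x_scaled = [4.0 * x_j for x_j in x]  # same direction, different magnitude

y = one_bit_measurements(A, x)
y_scaled = one_bit_measurements(A, x_scaled)
print(y == y_scaled)
```

Both signals produce the same sign vector, so only the direction $x/\|x\|_2$ can be identified from $y$.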

We shall show that the signal can be accurately recovered by solving the following convex minimization program

(1.3)    $\min \|x'\|_1$ subject to $\operatorname{sign}(Ax') = y$ and $\|Ax'\|_1 = m$.



The first constraint, $\operatorname{sign}(Ax') = y$, keeps the solution consistent with the measurements. It is
defined by the relation $\langle a_i, x' \rangle \cdot y_i \ge 0$ for $i = 1, 2, \dots, m$, where $a_i$ is the $i$-th row of $A$. The second
constraint, $\|Ax'\|_1 = m$, serves to prevent the program from returning a zero solution. Moreover,
this constraint is linear as it can be represented as one linear equation $\sum_{i=1}^m y_i \langle a_i, x' \rangle = m$ where $y_i$
denote the coordinates of $y$. Therefore (1.3) is indeed a convex minimization program; furthermore
one can easily represent it as a linear program, see (5.3) below. Note also that the number $m$
in (1.3) is chosen for convenience of the analysis; it can be replaced by any other fixed positive
number.
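
For orientation, here is one standard way to write (1.3) with linear constraints only. It is a sketch under the assumption that auxiliary variables $u_j$ are introduced to bound $|x'_j|$; it is not claimed to coincide with the authors' formulation (5.3).

```latex
\begin{aligned}
\min_{x',\,u \in \mathbb{R}^n} \quad & \sum_{j=1}^n u_j \\
\text{subject to} \quad & -u_j \le x'_j \le u_j, \qquad j = 1, \dots, n, \\
& y_i \langle a_i, x' \rangle \ge 0, \qquad i = 1, \dots, m, \\
& \sum_{i=1}^m y_i \langle a_i, x' \rangle = m.
\end{aligned}
```

At an optimum each $u_j$ equals $|x'_j|$, so the objective equals $\|x'\|_1$, and the last two lines are exactly the two constraints discussed above.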

Theorem 1.1 (Recovery from one-bit measurements). Let $n, m, s > 0$, and let $A$ be an $m \times n$
random matrix with independent standard normal entries. Set

(1.4)    $\delta = C \left( \frac{s}{m} \log(2n/s) \log(2n/m + 2m/n) \right)^{1/5}$.

Then, with probability at least $1 - C \exp(-c \delta m)$, the following holds uniformly for all signals $x \in \mathbb{R}^n$
satisfying $\|x\|_1 / \|x\|_2 \le \sqrt{s}$. Let $y = \operatorname{sign}(Ax)$. Then the solution $\hat{x}$ of the convex minimization
program (1.3) satisfies

$\left\| \frac{\hat{x}}{\|\hat{x}\|_2} - \frac{x}{\|x\|_2} \right\|_2 \le \delta$.

Here and thereafter $C$ and $c$ denote positive absolute constants; other standard notation is
explained in Section 1.3.

Remark 1 (Effective sparsity). The Cauchy-Schwarz inequality implies that $\|x\|_1 / \|x\|_2 \le \sqrt{\|x\|_0}$
where $\|x\|_0 = |\operatorname{supp}(x)|$ is the number of nonzero elements of $x$. Therefore one can view the
parameter $(\|x\|_1 / \|x\|_2)^2$ as a measure of effective sparsity of the signal $x$. The effective sparsity is
thus a real valued and robust extension of the sparsity parameter $\|x\|_0$, which allows one to handle
approximately sparse vectors.
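
A short numerical sketch may help make the robustness claim concrete. The function names and the two test vectors are assumptions chosen for illustration: the second vector perturbs every zero entry of the first by a small amount, which changes $\|x\|_0$ drastically but the effective sparsity only slightly.

```python
import math

def effective_sparsity(x):
    """(||x||_1 / ||x||_2)^2, the effective sparsity from Remark 1."""
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (l1 / l2) ** 2

def l0(x):
    """Number of nonzero entries of x."""
    return sum(1 for v in x if v != 0)

exact = [1.0, -2.0, 0.5] + [0.0] * 97    # exactly 3-sparse
approx = [1.0, -2.0, 0.5] + [1e-3] * 97  # every entry nonzero, but 97 of them are tiny

print(l0(exact), effective_sparsity(exact))
print(l0(approx), effective_sparsity(approx))
```

For the perturbed vector the count of nonzero entries jumps to the full dimension, while the effective sparsity stays below 3.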

Let us then state the special case of Theorem 1.1 for sparse signals:

Corollary 1.2 (Sparse recovery from one-bit measurements). Let $n, m, s > 0$, and set $\delta$ as in
(1.4). Then, with probability at least $1 - C \exp(-c \delta m)$, the following holds uniformly for all signals
$x \in \mathbb{R}^n$ satisfying $\|x\|_0 \le s$. Let $y = \operatorname{sign}(Ax)$. Then the solution $\hat{x}$ of the convex minimization
program (1.3) satisfies

$\left\| \frac{\hat{x}}{\|\hat{x}\|_2} - \frac{x}{\|x\|_2} \right\|_2 \le \delta$.

$^1$ To be precise, for a scalar $z \ne 0$ we define $\operatorname{sign}(z) = z/|z|$, and $\operatorname{sign}(0) = 0$. We allow the sign function to act on
a vector by acting individually on each element.

Remark 2 (Number of measurements). The conclusion of Corollary 1.2 can be stated in the following
useful way. With high probability, an arbitrarily accurate estimation of every $s$-sparse vector $x$ can
be achieved from

$m = O(s \log^2(n/s))$

one-bit random measurements. The implicit factor in the $O(\cdot)$ notation depends only on the desired
accuracy level $\delta$; more precisely $m \sim \delta^{-5} s \log^2(n/s)$ up to an absolute constant factor. The same
holds if $x$ is only effectively $s$-sparse as in Theorem 1.1. The central point here is that the number
of measurements is almost linear in the sparsity $s$, which can be much smaller than the ambient
dimension $n$.

Remark 3 (Non-Gaussian measurements). Most results in compressed sensing, and in random matrix
theory in general, are valid not only for Gaussian random matrices but also for general random
matrix ensembles. In one-bit compressed sensing, since the measurements $\operatorname{sign}(Ax)$ do not depend
on the scaling of the rows of $A$, it is clear that our results will not change if the rows of $A$ are
sampled independently from any rotationally invariant distribution in $\mathbb{R}^n$ (for example, the uniform
distribution on the unit Euclidean sphere $S^{n-1}$).

However, in contrast to the widespread universality phenomenon, one-bit compressed sensing
cannot be generalized to some of the simplest discrete distributions, such as Bernoulli. Indeed,
suppose the entries of $A$ are independent $\pm 1$ valued symmetric random variables. Then for the
vectors $x = (1, 0, 0, \dots, 0)$ and $x' = (1, \frac{1}{2}, 0, \dots, 0)$ one can easily check that $\operatorname{sign}(Ax) = \operatorname{sign}(Ax')$
for any number of measurements $m$. So one-bit measurements cannot distinguish between two
fixed distinct signals $x$ and $x'$ no matter how many measurements are taken.
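
The Bernoulli counterexample can be checked directly. The sketch below uses assumed toy dimensions and helper names; the point is that $\langle a_i, x' \rangle = a_{i1} + a_{i2}/2$ always has the sign of $a_{i1} = \langle a_i, x \rangle$ when the entries are $\pm 1$.

```python
import random

def sign(z):
    """sign(z) = z/|z| for z != 0 and sign(0) = 0."""
    return (z > 0) - (z < 0)

def signs(A, x):
    """Return sign(Ax) for A given as a list of rows (illustrative helper)."""
    return [sign(sum(a * v for a, v in zip(row, x))) for row in A]

rng = random.Random(1)
m, n = 1000, 5  # toy sizes; any m gives the same conclusion
A = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]  # symmetric +-1 entries

x = [1.0, 0.0, 0.0, 0.0, 0.0]
x_prime = [1.0, 0.5, 0.0, 0.0, 0.0]

indistinguishable = signs(A, x) == signs(A, x_prime)
print(indistinguishable)
```

The two signals have different directions, yet no number of $\pm 1$ measurements separates them.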

Remark 4 (Optimality). For a fixed level of accuracy, our estimate on the number of measurements
$m = O(s \log^2(n/s))$ matches the best known number of measurements in the classical (not quantized)
compressed sensing problem up to the exponent 2 of the logarithm, and up to an absolute
constant factor. However, we believe that the exponent 2 can be reduced to 1. We also believe
that the error $\delta$ in Theorem 1.1 may decrease more quickly as $s/m \to 0$. In particular, Jacques
et al. [18] demonstrate that if $x$ is exactly sparse and is estimated using an $\ell_0$-minimization-based
approach, the error is upper bounded as $\delta = O((s/m)^{1-o(1)} \log n)$; they also demonstrate a lower
error bound $\delta = \Omega(s/m)$ regardless of what algorithm is used. In fact, such a result is not possible
when $x$ is only known to be effectively sparse (i.e., $\|x\|_1 / \|x\|_2 \le \sqrt{s}$). Instead, the best possible
bound is of the form $\delta = O(\sqrt{(s/m) \log(n/s)})$ (this can be checked via entropy arguments). We
believe this is achievable (and is optimal) for the convex program (1.3).

1.2. Prior work. While there have been several numerical results for quantized compressed sens- 
ing [6, 4, 5, 20, 28], as well as guarantees on the convergence of many of the algorithms used for 
these numerical results, theoretical accuracy guarantees have been much less developed. One may 
endeavor to circumvent this problem by considering quantization errors as a source of noise, thereby 
reducing the quantized compressed sensing problem to the noisy classical compressed sensing prob- 
lem. Further, in some cases the theory and algorithms of noisy compressed sensing may be adapted 
to this problem as in [28, 11, 17, 25]; the method of quantization may be specialized in order to 
minimize the recovery error. As noted in [19] if the range of the signal is unspecified, then such 
a noise source is unbounded, and so the classical theory does not apply. However, in the setup of 
our paper we may assume without loss of generality that $\|x\|_2 = 1$, and thus it is possible that
the methods of Candes and Tao [8] can be adapted to derive a version of Corollary 1.2 for a fixed 
sparse signal x. Nevertheless, we do not see any way to deduce by these methods a uniform result 
over all sparse signals x. 

In a complementary line of research Ardestanizadeh et al. [2] consider compressed sensing with a
finite number of bits per measurement. However, the number of bits per measurement there is not
one (or constant); this number depends on the sparsity level $s$ and the dynamic range of the signal
$x$. Similarly, in the work of Gunturk et al. [14, 15] on sigma-delta quantization, the number of bits
per measurement depends on the dynamic range of $x$. On the other hand, by considering sigma-delta
quantization and multiple bits, Gunturk et al. are able to provide excellent guarantees
on the speed of decay of the error $\delta$ as $s/m$ decreases.

The framework of one-bit compressed sensing was introduced by Boufounos and Baraniuk in [6]. 
Jacques et al. [18] show that $O(s \log n)$ one-bit measurements are sufficient to recover an $s$-sparse
vector with arbitrary precision; their results are also robust to bit flips. In particular, their results
require the estimate $\hat{x}$ to be as sparse as $x$, have unit norm, and be consistent with the data. The
difficulty is that the first two of these constraints are non-convex, and thus the only program
known to return such an estimate is $\ell_0$ minimization with the unit norm constraint, which
is generally considered to be intractable. Gupta et al. [16] demonstrate that one may tractably
recover the support of $x$ from $O(s \log n)$ measurements. They give two measurement schemes. One
is non-adaptive, but the number of measurements has a quadratic dependence on the dynamic
range of the signal. The other has no such dependence but is adaptive. Our results settle several
of these issues: (a) we make no assumption about the dynamic range of the signal, (b) the one-bit
measurements are non-adaptive, and (c) the signal is recovered by a tractable algorithm (linear
programming).

1.3. Notation and organization of the paper. Throughout the paper, $C$, $c$, $C_1$, etc. denote
absolute constants whose values may change from line to line. For integer $n$, we denote $[n] =
\{1, \dots, n\}$. Vectors are written in bold italics, e.g., $x$, and their coordinates are written in plain
text so that the $i$-th component of $x$ is $x_i$. For a subset $T \subset [n]$, $x_T$ is the vector $x$ restricted to the
elements indexed by $T$. The $\ell_1$ and $\ell_2$ norms of a vector $x \in \mathbb{R}^n$ are defined as $\|x\|_1 = \sum_{i=1}^n |x_i|$
and $\|x\|_2 = (\sum_{i=1}^n x_i^2)^{1/2}$ respectively. The number of non-zero coordinates of $x$ is denoted by
$\|x\|_0 = |\operatorname{supp}(x)|$. The unit balls with respect to $\ell_1$ and $\ell_2$ norms are denoted by $B_1^n = \{x \in \mathbb{R}^n :
\|x\|_1 \le 1\}$ and $B_2^n = \{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$ respectively. The unit Euclidean sphere is denoted
$S^{n-1} = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$.

The rest of the paper is devoted to proving Theorem 1.1. In Section 2 we reduce this task to the
following two ingredients: (a) Theorem 2.3 which states that a solution to (1.3) is effectively
sparse, and (b) Theorem 2.2 which analyzes a simpler but non-convex version of (1.3) where the
constraint $\|Ax'\|_1 = m$ is replaced by $\|x'\|_2 = 1$. The latter result can be interpreted in a geometric
way in terms of random hyperplane tessellations of a subset $K$ of the Euclidean sphere, specifically
for the set of effectively sparse signals $K = S^{n-1} \cap \sqrt{s} B_1^n$. In Section 3 we estimate the metric
entropy of $K$, and we use this in Section 4 to prove our main geometric result of independent
interest: $m = O(s \log(n/s))$ random hyperplanes are enough to cut $K$ into small pieces, yielding
that all cells of the resulting tessellation have arbitrarily small diameter. This will complete part
(b) above. For part (a), we prove Theorem 2.3 on the effective sparsity of solutions in Section 5.
The proof is based on counting all possible solutions of (1.3), which are the vertices of the feasible
polytope. This will allow us to use standard concentration inequalities from the Appendix and to
conclude the argument by a union bound.

Acknowledgement. The authors are grateful to Sinan Gunturk for pointing out an inaccuracy 
in the statement of Lemma 3.4 in an earlier version of this paper. 

2. Strategy of the proof 

Our proof of Theorem 1.1 has two main ingredients which we explain in this section. Throughout 
the paper, $a_i$ will denote the rows of $A$, which are i.i.d. standard normal vectors in $\mathbb{R}^n$.

Let us revisit the second constraint $\|Ax'\|_1 = m$ in the convex minimization program (1.3).
Consider a fixed signal $x'$ for the moment. Taking the expectation with respect to the random
matrix $A$, we see that

$\mathbb{E} \|Ax'\|_1 = \sum_{i=1}^m \mathbb{E} |\langle a_i, x' \rangle| = c m \|x'\|_2$

where $c = \sqrt{2/\pi}$. Here we used that the first absolute moment of the standard normal random
variable equals $c$. So in expectation, the constraint $\|Ax'\|_1 = m$ is equivalent to $\|x'\|_2 = 1$ up to the
constant factor $c$.
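
The identity $\mathbb{E}\|Ax'\|_1 = \sqrt{2/\pi}\, m \|x'\|_2$ is easy to probe by simulation. The following is a minimal Monte Carlo sketch; the function name, the test vector, and the trial count are assumptions made for illustration.

```python
import math
import random

def mean_l1_of_Ax(x, m, trials, rng):
    """Monte Carlo estimate of E||Ax||_1 for an m x n matrix A with i.i.d. N(0,1) entries."""
    total = 0.0
    for _ in range(trials):
        for _ in range(m):  # one fresh Gaussian row a_i per measurement
            total += abs(sum(rng.gauss(0.0, 1.0) * x_j for x_j in x))
    return total / trials

rng = random.Random(0)
x = [3.0, -4.0, 0.0]  # ||x||_2 = 5
m = 10
l2 = math.sqrt(sum(v * v for v in x))

estimate = mean_l1_of_Ax(x, m, 2000, rng)
predicted = math.sqrt(2.0 / math.pi) * m * l2
print(estimate, predicted)
```

The two printed values should agree up to sampling error, which is the sense in which the constraint $\|Ax'\|_1 = m$ acts as a proxy for a fixed Euclidean norm.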

This observation suggests that we may first try to analyze the simpler minimization program 

(2.1)    $\min \|x'\|_1$ subject to $\operatorname{sign}(Ax') = y$ and $\|x'\|_2 = 1$.

This optimization program was first proposed in [6]. Unfortunately, it is non-convex due to the 
constraint ||x'|| 2 = 1, and therefore seems to be computationally intractable. On the other hand, 
we find that the non-convex program (2.1) is more amenable to theoretical analysis than the convex 
program (1.3). 

The first ingredient of our theory will be to demonstrate that the non-convex optimization 
program (2.1) leads to accurate recovery of an effectively sparse signal x. One can reformulate 
this as a geometric problem about random hyperplane tessellations. We will discuss tessellations in 
Section 4; the main result of that section is Theorem 4.2 which immediately implies the following 
result: 

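Theorem 2.1. Let $n, m, s > 0$, and set

(2.2)    $\delta = C \left( \frac{s}{m} \log(2n/s) \right)^{1/5}$.

Then, with probability at least $1 - C \exp(-c \delta m)$, the following holds uniformly for all $x, \hat{x} \in \mathbb{R}^n$
that satisfy $\|x\|_2 = \|\hat{x}\|_2 = 1$, $\|x\|_1 \le \sqrt{s}$, $\|\hat{x}\|_1 \le \sqrt{s}$:

$\operatorname{sign}(A\hat{x}) = \operatorname{sign}(Ax)$ implies $\|\hat{x} - x\|_2 \le \delta$.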

Theorem 2.1 yields a version of our main Theorem 1.1 for the non-convex program (2.1): 

Theorem 2.2 (Non-convex recovery). Let $n, m, s > 0$, and set $\delta$ as in (2.2). Then, with probability
at least $1 - C \exp(-c \delta m)$, the following holds uniformly for all signals $x \in \mathbb{R}^n$ satisfying
$\|x\|_1 / \|x\|_2 \le \sqrt{s}$. Let $y = \operatorname{sign}(Ax)$. Then the solution $\hat{x}$ of the non-convex minimization program
(2.1) satisfies

$\left\| \hat{x} - \frac{x}{\|x\|_2} \right\|_2 \le \delta$.

Proof. We can assume without loss of generality that $\|x\|_2 = 1$ and thus $\|x\|_1 \le \sqrt{s}$. Since $x$ is
feasible for the program (2.1), we also have $\|\hat{x}\|_1 \le \|x\|_1 \le \sqrt{s}$, and $\hat{x} \in S^{n-1}$ by the second
constraint of (2.1). Therefore
Theorem 2.1 applies to $x, \hat{x}$, and it yields that $\|\hat{x} - x\|_2 \le \delta$ as required. □

Remark 5 (Prior work). A version of Theorem 2.1 was recently proved in [18] for exactly sparse
signals $x, \hat{x}$, i.e. such that $\|x\|_2 = \|\hat{x}\|_2 = 1$, $\|x\|_0 \le s$, $\|\hat{x}\|_0 \le s$. This latter result holds with
$\delta = C (s/m)^{1-o(1)} \log(2n)$. However, from the proof of Theorem 2.2 given above one sees that
the result of [18] would not be sufficient to deduce our main results, even Corollary 1.2 for exactly
sparse vectors. The reason is that our goal is to solve a tractable program that involves the $\ell_1$ norm,
and thus we cannot directly assume that our estimate will be in the low-dimensional set of exactly
sparse vectors. Our proof of Theorem 2.1 has to overcome some additional difficulties compared
to [18] caused by the absence of any control of the supports of the signals $x, \hat{x}$. In particular, the
metric entropy of the set of unit-normed, sparse vectors only grows logarithmically with the inverse
of the covering accuracy. This allows the consideration of a very fine cover in the proofs in [18].
In contrast, the metric entropy of the set of vectors satisfying $\|x\|_2 \le 1$ and $\|x\|_1 \le \sqrt{s}$ is much
larger at fine scales, thus necessitating a different strategy of proof.

Theorem 1.1 would follow if we could demonstrate that the convex program (1.3) and the non-convex
program (2.1) were equivalent. Rather than doing this explicitly, we shall prove that the
solution $\hat{x}$ of the convex program (1.3) essentially preserves the effective sparsity of a signal $x$, and
we finish off by applying Theorem 2.1.

Theorem 2.3 (Preserving effective sparsity). Let $n, s > 0$ and suppose that $m \ge C s \log(n/s)$.
Then, with probability at least $1 - C \exp(-c m)$, the following holds uniformly for all signals $x$
satisfying $\|x\|_1 / \|x\|_2 \le \sqrt{s}$. Let $y = \operatorname{sign}(Ax)$. Then the solution $\hat{x}$ of the convex minimization
program (1.3) satisfies

$\frac{\|\hat{x}\|_1}{\|\hat{x}\|_2} \le \frac{\|x\|_1}{\|x\|_2} \cdot C \sqrt{\log(2n/m + 2m/n)}$.

This result is the second main ingredient of our argument, and it will be proved in Section 5. 
Now we are ready to deduce Theorem 1.1. 

Proof of Theorem 1.1. Consider a signal $x$ as in Theorem 1.1, so $\|x\|_1 / \|x\|_2 \le \sqrt{s}$. In view of
the application of Theorem 2.3, we may assume without loss of generality that $m \ge C s \log(n/s)$.
Indeed, otherwise we have $\delta > 2$ and the conclusion of Theorem 1.1 is trivial. So Theorem 2.3
applies and gives

$\frac{\|\hat{x}\|_1}{\|\hat{x}\|_2} \le C \sqrt{s \log(2n/m + 2m/n)} =: \sqrt{s_0}$.

Also, as we noted above, $\|x\|_1 / \|x\|_2 \le \sqrt{s} \le \sqrt{s_0}$. So Theorem 2.1 applies for the normalized
vectors $x/\|x\|_2$, $\hat{x}/\|\hat{x}\|_2$ and for $s_0$. Note that $\operatorname{sign}(A\hat{x}) = \operatorname{sign}(Ax) = y$ because $\hat{x}$ is a feasible
vector for the program (1.3). Therefore Theorem 2.1 yields

$\left\| \frac{\hat{x}}{\|\hat{x}\|_2} - \frac{x}{\|x\|_2} \right\|_2 \le \delta$

where

$\delta = C \left( \frac{s_0}{m} \log(2n/s_0) \right)^{1/5} \le C' \left( \frac{s}{m} \log(2n/s) \log(2n/m + 2m/n) \right)^{1/5}$.

This completes the proof. □

For the rest of the paper, our task will be to prove the two ingredients above - Theorem 2.1, 
which we shall relate to a more general hyperplane tessellation problem, and Theorem 2.3 on the 
effective sparsity of the solution. 

3. Geometry of signal sets 

Our arguments are based on the geometry of the set of effectively $s$-sparse signals

$K_{n,s} := \{x \in \mathbb{R}^n : \|x\|_2 \le 1, \ \|x\|_1 \le \sqrt{s}\} = B_2^n \cap \sqrt{s} B_1^n$

and the set of $s$-sparse signals

$S_{n,s} := \{x \in \mathbb{R}^n : \|x\|_2 \le 1, \ \|x\|_0 \le s\}$.

While the set $S_{n,s}$ is not convex, $K_{n,s}$ is, and moreover it is a convexification of $S_{n,s}$ in the following
sense. Below, for a set $K$, we define $\operatorname{conv}(K)$ to be its convex hull.

Lemma 3.1 (Convexification). One has $\operatorname{conv}(S_{n,s}) \subset K_{n,s} \subset 2 \operatorname{conv}(S_{n,s})$.

Proof. The first containment follows by the Cauchy-Schwarz inequality, which implies for each $x \in S_{n,s}$
that $\|x\|_1 \le \sqrt{s}$. The second containment is proved using a common technique in the compressed
sensing literature. Let $x \in K_{n,s}$. Partition the support of $x$ into disjoint subsets $T_1, T_2, \dots$ so that
$T_1$ indexes the largest $s$ elements of $x$ (in magnitude), $T_2$ indexes the next $s$ largest elements, and
so on. Since all $x_{T_i} \in S_{n,s}$, in order to complete the proof it suffices to show that

$\sum_{i \ge 1} \|x_{T_i}\|_2 \le 2$.

To prove this, first note that $\|x_{T_1}\|_2 \le \|x\|_2 \le 1$. Second, note that for $i \ge 2$, each element of $x_{T_i}$
is bounded in magnitude by $\|x_{T_{i-1}}\|_1 / s$, and thus $\|x_{T_i}\|_2 \le \|x_{T_{i-1}}\|_1 / \sqrt{s}$. Combining these two
facts we obtain

(3.1)    $\sum_{i \ge 1} \|x_{T_i}\|_2 \le 1 + \sum_{i \ge 2} \|x_{T_i}\|_2 \le 1 + \sum_{i \ge 2} \|x_{T_{i-1}}\|_1 / \sqrt{s} \le 1 + \|x\|_1 / \sqrt{s} \le 2$,

where in the last inequality we used that $\|x\|_1 \le \sqrt{s}$ for $x \in K_{n,s}$. The proof is complete. □
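
The block decomposition in this proof can be sketched numerically. The sampler `sample_from_K` below is a hypothetical helper (it simply rescales a random vector into $K_{n,s}$) and the dimensions are assumptions; the check is that the block norms never sum to more than 2, as (3.1) guarantees.

```python
import math
import random

def block_l2_norms(x, s):
    """l2 norms of x restricted to T_1, T_2, ...: blocks of s entries in decreasing magnitude."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return [math.sqrt(sum(v * v for v in mags[i:i + s])) for i in range(0, len(mags), s)]

def sample_from_K(n, s, rng):
    """Hypothetical sampler: rescale a random vector so that ||x||_2 <= 1 and ||x||_1 <= sqrt(s)."""
    x = [rng.gauss(0.0, 1.0) * rng.random() ** 4 for _ in range(n)]
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    scale = min(1.0 / l2, math.sqrt(s) / l1)
    return [v * scale for v in x]

rng = random.Random(0)
n, s = 200, 5  # toy sizes for illustration
worst = max(sum(block_l2_norms(sample_from_K(n, s, rng), s)) for _ in range(200))
print(worst)
```

The largest observed value of $\sum_i \|x_{T_i}\|_2$ stays at or below 2, consistent with $K_{n,s} \subset 2\operatorname{conv}(S_{n,s})$.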

Our arguments will rely on entropy bounds for the set $K_{n,s}$. Consider a more general situation,
where $K$ is a bounded subset of $\mathbb{R}^n$ and $\varepsilon > 0$ is a fixed number. A subset $N \subset K$ is called an
$\varepsilon$-net of $K$ if for every $x \in K$ one can find $y \in N$ so that $\|x - y\|_2 \le \varepsilon$. The minimal cardinality
of an $\varepsilon$-net of $K$ is called the covering number and denoted $N(K, \varepsilon)$. The number $\log N(K, \varepsilon)$ is
called the metric entropy of $K$. The covering numbers are (almost) increasing by inclusion:

(3.2)    $K' \subset K$ implies $N(K', 2\varepsilon) \le N(K, \varepsilon)$.

Specializing to our sets of signals $K_{n,s}$ and $S_{n,s}$, we come across a useful example of an $\varepsilon$-net:

Lemma 3.2 (Sparse net). Let $s \le t$. Then $S_{n,t} \cap K_{n,s}$ is a $\sqrt{s/t}$-net of $K_{n,s}$.

Proof. Let $x \in K_{n,s}$, and let $T \subset [n]$ denote the set of the indices of the $t$ largest coefficients of $x$
(in magnitude). Using the decomposition $x = x_T + x_{T^c}$ and noting that $x_T \in S_{n,t} \cap K_{n,s}$, we see
that it suffices to check that $\|x_{T^c}\|_2 \le \sqrt{s/t}$. This will follow from the same steps as in (3.1). In
particular, we have

$\|x_{T^c}\|_2 \le \|x\|_1 / \sqrt{t} \le \sqrt{s/t}$

as required. □
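
The key inequality $\|x_{T^c}\|_2 \le \|x\|_1/\sqrt{t}$ holds for every vector, not only for those in $K_{n,s}$, and can be checked on an arbitrary example. The helper name `tail_l2` and the test vector are illustrative assumptions.

```python
import math
import random

def tail_l2(x, t):
    """l2 norm of x after removing its t largest-magnitude coefficients, i.e. ||x_{T^c}||_2."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return math.sqrt(sum(v * v for v in mags[t:]))

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]  # arbitrary test vector
l1 = sum(abs(v) for v in x)

# Each entry is (t, ||x_{T^c}||_2, ||x||_1 / sqrt(t)).
checks = [(t, tail_l2(x, t), l1 / math.sqrt(t)) for t in (1, 5, 25, 125)]
for t, tail, bound in checks:
    print(t, tail, bound)
```

Keeping more coefficients (larger $t$) shrinks both the tail and the bound, which is what makes the sparse vectors a net at scale $\sqrt{s/t}$.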

Next we pass to quantitative entropy estimates. The entropy of the Euclidean ball can be 
estimated using a standard volume comparison argument, as follows (see [24, Lemma 4.16]): 

(3.3)    $N(B_2^n, \varepsilon) \le (3/\varepsilon)^n$, $\varepsilon \in (0, 1)$.

From this we deduce a known bound on the entropy of $S_{n,s}$.

Lemma 3.3 (Entropy of $S_{n,s}$). For $\varepsilon \in (0, 1)$ and $s \le n$, we have

$\log N(S_{n,s}, \varepsilon) \le s \log\left( \frac{9n}{\varepsilon s} \right)$.

Proof. We represent $S_{n,s}$ as the union of the unit Euclidean balls $B_2^n \cap \mathbb{R}^I$ in all $s$-dimensional
coordinate subspaces, $I \subset [n]$, $|I| = s$. Each ball $B_2^n \cap \mathbb{R}^I$ has an $\varepsilon$-net of cardinality at most
$(3/\varepsilon)^s$, according to (3.3). The union of these nets forms an $\varepsilon$-net of $S_{n,s}$, and since the number of
possible $I$ is $\binom{n}{s}$, the resulting net has cardinality at most $\binom{n}{s} (3/\varepsilon)^s \le (3en/(\varepsilon s))^s$. Taking the
logarithm completes the proof. □
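
The counting step can be verified on concrete numbers. This sketch compares the logarithm of the net size $\binom{n}{s}(3/\varepsilon)^s$ with the bound $s \log(9n/(\varepsilon s))$ of the lemma; the function names and the parameter triples are assumptions for illustration.

```python
import math

def log_net_size(n, s, eps):
    """log of binom(n, s) * (3/eps)^s, the size of the union of nets built in the proof."""
    return math.log(math.comb(n, s)) + s * math.log(3.0 / eps)

def lemma_bound(n, s, eps):
    """s * log(9n / (eps * s)), the entropy bound stated in Lemma 3.3."""
    return s * math.log(9.0 * n / (eps * s))

cases = [(1000, 10, 0.5), (1000, 1, 0.1), (50, 50, 0.9), (10**6, 100, 0.01)]
for n, s, eps in cases:
    print(n, s, eps, log_net_size(n, s, eps), lemma_bound(n, s, eps))
```

The slack between the two columns comes from $\binom{n}{s} \le (en/s)^s$ and from rounding $3e$ up to 9.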

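As a consequence, we obtain an entropy bound for $K_{n,s}$:

Lemma 3.4 (Entropy of $K_{n,s}$). For $\varepsilon \in (0, 1)$, we have

$\log N(K_{n,s}, \varepsilon) \le \begin{cases} n \log(6/\varepsilon) & \text{if } 0 < \varepsilon < 2\sqrt{s/n}, \\ \frac{4s}{\varepsilon^2} \log\left( \frac{9 \varepsilon n}{s} \right) & \text{if } 2\sqrt{s/n} \le \varepsilon \le 1 \end{cases} \ \le \ \frac{Cs}{\varepsilon^2} \log\left( \frac{2n}{s} \right)$.

Proof. First note that $K_{n,s} \subset B_2^n$. Then the monotonicity property (3.2) followed by the volumetric
estimate (3.3) yield the first desired bound $\log N(K_{n,s}, \varepsilon) \le \log N(B_2^n, \varepsilon/2) \le n \log(6/\varepsilon)$ for all $\varepsilon \in (0, 1)$.

Next, suppose that $2\sqrt{s/n} \le \varepsilon \le 1$. Then set $t := 4s/\varepsilon^2 \le n$. Lemma 3.2 states that $S_{n,t} \cap K_{n,s}$ is
an $(\varepsilon/2)$-net of $K_{n,s}$. Furthermore, to find an $(\varepsilon/2)$-net of $S_{n,t}$, we use Lemma 3.3 for $\varepsilon/4$ and for
$t$. Taking into account the monotonicity property (3.2), we see that there exists an $(\varepsilon/2)$-net $N$ of
$S_{n,t} \cap K_{n,s}$ such that

$\log |N| \le t \log\left( \frac{36 n}{\varepsilon t} \right) = \frac{4s}{\varepsilon^2} \log\left( \frac{9 \varepsilon n}{s} \right)$.

It follows that $N$ is an $\varepsilon$-net of $K_{n,s}$, and its cardinality is as required. □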



4. Random hyperplane tessellations 

In this section we prove a generalization of Theorem 2.1. We consider a set $K \subset \mathbb{R}^n$ and a collection
of $m$ random hyperplanes in $\mathbb{R}^n$, chosen independently and uniformly from the Haar measure.
The resulting partition of $K$ by this collection of hyperplanes is called a random tessellation of $K$.
The cells of the tessellation are formed by intersection of $K$ and the $m$ random half-spaces with
particular orientations. The main interest in the theory of random tessellations is the typical shape
of the cells.




Figure 1. Hyperplane tessellation of a subset K of a sphere 



We shall study the situation where $K$ is a subset of the sphere $S^{n-1}$, see Figure 1. The particular
example of $K = S^{n-1}$ is a natural model of random hyperplane tessellation in the spherical space
$S^{n-1}$. The more classical and well studied model of random hyperplane tessellation is in Euclidean
space $\mathbb{R}^n$, where the hyperplanes are allowed to be affine, see [23] for the history of this field.
Random hyperplane tessellations of the sphere are studied in particular in [22].

Here we focus on the following question. How many random hyperplanes ensure that all the cells
of the tessellation of $K$ have small diameter (such as 1/2)? For the purposes of this paper, we shall
address this problem for a specific set, namely for

$K = S^{n-1} \cap \sqrt{s} B_1^n = S^{n-1} \cap K_{n,s}$.

We shall prove that $m = O(s \log(n/s))$ hyperplanes suffice with high probability. Our argument
can be extended to more general sets $K$, but we defer generalizations to a later paper.

Theorem 4.1 (Random hyperplane tessellations). Let $s \le n$ and $m$ be positive integers. Consider
the tessellation of the set $K = S^{n-1} \cap \sqrt{s} B_1^n$ by $m$ random hyperplanes in $\mathbb{R}^n$ chosen independently
and uniformly from the Haar measure. Let $\delta \in (0, 1)$, and assume that

$m \ge C \delta^{-5} s \log(2n/s)$.

Then, with probability at least $1 - 2\exp(-\delta m)$, all cells of the tessellation of $K$ have diameter at
most $\delta$.

It is convenient to represent the random hyperplanes in Theorem 4.1 as $(a_i)^\perp$, $i = 1, \dots, m$,
where $a_i$ are i.i.d. standard normal vectors in $\mathbb{R}^n$. The claim that all cells of the tessellation of
$K$ have diameter at most $\delta$ can be restated in the following way. Every pair of points $x, y \in K$
satisfying $\|x - y\|_2 > \delta$ is separated by at least one of the hyperplanes, so there exists $i \in [m]$ such
that

$\langle a_i, x \rangle > 0$, $\langle a_i, y \rangle < 0$.

Theorem 4.1 is then a direct consequence of the following slightly stronger result.

Theorem 4.2 (Separation by a set of hyperplanes). Let $s \le n$ and $m$ be positive integers. Consider
the set $K = S^{n-1} \cap \sqrt{s} B_1^n$ and independent random vectors $a_1, \dots, a_m \sim N(0, \mathrm{Id})$ in $\mathbb{R}^n$. Let
$\delta \in (0, 1)$, and assume that

$m \ge C \delta^{-5} s \log(2n/s)$.

Then, with probability at least $1 - 2\exp(-\delta m)$, the following holds. For every pair of points $x, y \in K$
satisfying $\|x - y\|_2 \ge \delta$, there is a set of at least $c \delta m$ of the indices $i \in [m]$ that satisfy

$\langle a_i, x \rangle > c\delta$, $\langle a_i, y \rangle < -c\delta$.

We will prove Theorem 4.2 by the following covering argument, which will allow us to uniformly
handle all pairs $x, y \in K$ satisfying $\|x - y\|_2 \ge \delta$. We choose an $\varepsilon$-net $N_\varepsilon$ of $K$ as in Lemma 3.4.
We decompose the vector $x = x_0 + x'$ where $x_0 \in N_\varepsilon$ is a "center" and $x' \in \varepsilon B_2^n \cap K$ is a "tail",
and we do similarly for $y$. An elementary probabilistic argument and a union bound will allow us
to nicely separate each pair of centers $x_0, y_0 \in N_\varepsilon$ satisfying $\|x_0 - y_0\|_2 \ge \delta$ by $\Omega(m)$ hyperplanes.
(Specifically, it will follow that $\langle a_i, x_0 \rangle > c\delta$, $\langle a_i, y_0 \rangle < -c\delta$ for at least $c\delta m$ of the indices $i \in [m]$.)

Furthermore, the tails $x', y' \in \varepsilon B_2^n \cap \sqrt{s} B_1^n$ can be uniformly controlled using Lemma 5.4, which
implies that all tails are in a good position with respect to $m - o(m)$ hyperplanes. (Specifically, for
small $\varepsilon$ one can deduce that $|\langle a_i, x' \rangle| < c\delta/2$, $|\langle a_i, y' \rangle| < c\delta/2$ for at least $m - c\delta m/2$ of the indices
$i \in [m]$.) Putting the centers and the tails together, we shall conclude that $x$ and $y$ are separated
by at least $\Omega(m) + (m - o(m)) - m \ge \Omega(m)$ hyperplanes, as required.

Now we present the full proof of Theorem 4.2. 

4.1. Step 1: Decomposition into centers and tails. Let $\varepsilon \in (0, 1)$ be a number to be determined
later. Let $N_\varepsilon$ be an $\varepsilon$-net of $K$. Since $K \subset K_{n,s}$, Lemma 3.4 along with the monotonicity
property of entropy (3.2) guarantee that $N_\varepsilon$ can be chosen so that

(4.1)    $\log |N_\varepsilon| \le \frac{Cs}{\varepsilon^2} \log\left( \frac{2n}{s} \right)$.

Lemma 4.3 (Decomposition into centers and tails). Let $t = 4s/\varepsilon^2$. Then every vector $x \in K$ can
be represented as

(4.2)    $x = x_0 + \varepsilon x'$

where $x_0 \in N_\varepsilon$, $x' \in K_{n,t}$.

Proof. Since $N_\varepsilon$ is an $\varepsilon$-net of $K$, representation (4.2) holds for some $x' \in B_2^n$. Since $K_{n,t} = B_2^n \cap
\sqrt{t} B_1^n$, it remains to check that $x' \in \sqrt{t} B_1^n$. Note that $x \in K \subset \sqrt{s} B_1^n$ and $x_0 \in N_\varepsilon \subset K \subset \sqrt{s} B_1^n$.
By the triangle inequality this implies that $\varepsilon x' = x - x_0 \in 2\sqrt{s} B_1^n$. Thus $x' \in (2\sqrt{s}/\varepsilon) B_1^n = \sqrt{t} B_1^n$,
as claimed. □

4.2. Step 2: Separation of the centers. Our next task is to separate the centers $x_0, y_0$ of each
pair of points $x, y \in K$ that are far apart by $\Omega(m)$ hyperplanes. For a fixed pair of points and for
one hyperplane, it is easy to estimate the probability of a nice separation.

Lemma 4.4 (Separation by one hyperplane). Let $x, y \in S^{n-1}$ and assume that $\|x - y\|_2 \ge \delta$ for
some $\delta > 0$. Let $a \sim N(0, \mathrm{Id})$. Then for $\delta_0 = \delta/12$ we have

$\mathbb{P}\{\langle a, x \rangle > \delta_0, \ \langle a, y \rangle < -\delta_0\} \ge \delta_0$.

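Proof. Note that

$\mathbb{P}\{\langle a, x \rangle > \delta_0, \ \langle a, y \rangle < -\delta_0\} = \mathbb{P}\{\langle a, x \rangle > 0 \text{ and } \langle a, y \rangle < 0 \text{ and } \langle a, x \rangle \notin (0, \delta_0] \text{ and } \langle a, y \rangle \notin [-\delta_0, 0)\}$

$\ge 1 - \mathbb{P}\{\langle a, x \rangle \le 0 \text{ or } \langle a, y \rangle \ge 0\} - \mathbb{P}\{\langle a, x \rangle \in (0, \delta_0]\} - \mathbb{P}\{\langle a, y \rangle \in [-\delta_0, 0)\}$.

The inequality above follows by the union bound. Now, since $\langle a, x \rangle \sim N(0, 1)$ we have

$\mathbb{P}\{\langle a, x \rangle \in (0, \delta_0]\} \le \frac{\delta_0}{\sqrt{2\pi}}$ and $\mathbb{P}\{\langle a, y \rangle \in [-\delta_0, 0)\} \le \frac{\delta_0}{\sqrt{2\pi}}$.

Also, denoting the geodesic distance in $S^{n-1}$ by $d(\cdot, \cdot)$ it is not hard to show that

$\mathbb{P}\{\langle a, x \rangle \le 0 \text{ or } \langle a, y \rangle \ge 0\} = 1 - \frac{d(x, y)}{2\pi} \le 1 - \frac{\|x - y\|_2}{2\pi} \le 1 - \frac{\delta}{2\pi}$

(see [13, Lemma 3.2]). Thus

$\mathbb{P}\{\langle a, x \rangle > \delta_0, \ \langle a, y \rangle < -\delta_0\} \ge \frac{\delta}{2\pi} - \frac{2\delta_0}{\sqrt{2\pi}} \ge \delta_0$

as claimed. □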
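
A Monte Carlo sketch can illustrate the lemma for one pair of points. The specific pair (two unit vectors at angle $2\pi/3$), the trial count, and the function name are assumptions for illustration; the simulation only estimates the probability and does not replace the proof.

```python
import math
import random

def separation_frequency(x, y, delta0, trials, rng):
    """Fraction of Gaussian vectors a with <a, x> > delta0 and <a, y> < -delta0."""
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in x]
        ax = sum(a_j * x_j for a_j, x_j in zip(a, x))
        ay = sum(a_j * y_j for a_j, y_j in zip(a, y))
        if ax > delta0 and ay < -delta0:
            hits += 1
    return hits / trials

theta = 2.0 * math.pi / 3.0  # assumed angle between the two unit vectors
x = [1.0, 0.0, 0.0]
y = [math.cos(theta), math.sin(theta), 0.0]

delta = math.sqrt(sum((x_j - y_j) ** 2 for x_j, y_j in zip(x, y)))  # ||x - y||_2
delta0 = delta / 12.0
freq = separation_frequency(x, y, delta0, 20000, random.Random(0))
print(delta0, freq)
```

The observed frequency should exceed $\delta_0$ with room to spare, since the proof's lower bound $\delta/(2\pi) - 2\delta_0/\sqrt{2\pi}$ is itself not tight for this pair.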

Now we will pay attention to the number of hyperplanes that nicely separate a given pair of 
points. 

Definition 4.5 (Separating set). Let $\delta_0 \in (0, 1)$. The separating index set of a pair of points
$x, y \in S^{n-1}$ is defined as

$I_{\delta_0}(x, y) := \{i \in [m] : \langle a_i, x \rangle > \delta_0, \ \langle a_i, y \rangle < -\delta_0\}$.

The cardinality $|I_{\delta_0}(x, y)|$ is a binomial random variable, which is the sum of $m$ indicator functions
of the independent events $\{\langle a_i, x \rangle > \delta_0, \ \langle a_i, y \rangle < -\delta_0\}$. The probability of each such event
can be estimated using Lemma 4.4. Indeed, suppose $\|x - y\|_2 \ge \delta$ for some $\delta > 0$, and let $\delta_0 = \delta/12$.
Then the probability of each of the events above is at least $\delta_0$. Then $|I_{\delta_0}(x, y)| \sim \mathrm{Binomial}(m, p)$
with $p \ge \delta_0$. A standard deviation inequality (e.g. [1, Theorem A.1.13]) yields

(4.3)    $\mathbb{P}\{|I_{\delta_0}(x, y)| < \delta_0 m/2\} \le e^{-\delta_0 m/8}$.
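
The deviation bound (4.3) can be compared with exact binomial tails in the boundary case $p = \delta_0$. This is a sketch with assumed parameter pairs and an illustrative helper name; it evaluates $\mathbb{P}\{X < pm/2\}$ exactly for $X \sim \mathrm{Binomial}(m, p)$ and sets it against $e^{-pm/8}$.

```python
import math

def binomial_lower_tail(m, p, threshold):
    """Exact P{X < threshold} for X ~ Binomial(m, p)."""
    return sum(math.comb(m, k) * p ** k * (1.0 - p) ** (m - k)
               for k in range(m + 1) if k < threshold)

cases = [(100, 0.05), (200, 0.1), (400, 0.02)]  # (m, p) pairs chosen for illustration

# Each entry is (m, p, exact tail, exponential bound).
results = [(m, p, binomial_lower_tail(m, p, p * m / 2.0), math.exp(-p * m / 8.0))
           for m, p in cases]
for row in results:
    print(row)
```

In each case the exact tail sits below the exponential bound, which is the only property of the binomial distribution used in the union bound that follows.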

Now we take a union bound over pairs of centers in the net $N_\varepsilon$ that was chosen in the beginning
of Section 4.1.

Lemma 4.6 (Separation of the centers). Let $\varepsilon, \delta \in (0, 1)$, and let $N_\varepsilon$ be an $\varepsilon$-net of $K$ whose
cardinality satisfies (4.1). Assume that

(4.4)    $m \ge \frac{C_1 s}{\varepsilon^2 \delta} \log\left( \frac{2n}{s} \right)$.

Then, with probability at least $1 - \exp(-\delta m/100)$, the following event holds:

(4.5)    For every $x_0, y_0 \in N_\varepsilon$ such that $\|x_0 - y_0\|_2 \ge \delta$, one has $|I_{\delta/12}(x_0, y_0)| \ge \delta m/24$.

Proof. For a fixed pair $x_0, y_0$ as above, we can rewrite (4.3) as

$\mathbb{P}\{|I_{\delta/12}(x_0, y_0)| < \delta m/24\} \le e^{-\delta m/96}$.

A union bound over all pairs $x_0, y_0$ implies that the event in (4.5) fails with probability at most

$|N_\varepsilon|^2 \cdot e^{-\delta m/96}$.

By (4.1) and (4.4), this quantity is further bounded by

$\exp\left( \frac{Cs}{\varepsilon^2} \log\left( \frac{2n}{s} \right) - \frac{\delta m}{96} \right) \le \exp(-\delta m/100)$

provided the absolute constant $C_1$ is chosen sufficiently large. The proof is complete. □

4.3. Step 3: Control of the tails. Now we provide a uniform control of the tails $x' \in K_{n,t}$
that arise from the decomposition given in Lemma 4.3. The next result is a direct consequence of
Lemma 5.4.

Lemma 4.7 (Control of the tails). Let $t \ge 1$ and let $a_1, \dots, a_m \sim N(0, \mathrm{Id})$ be independent random
vectors in $\mathbb{R}^n$. Assume that

(4.6)    $m \ge C t \log(2n/t)$.

Then, with probability at least $1 - 2\exp(-cm)$, the following event holds:

$\sup_{x' \in K_{n,t}} \frac{1}{m} \sum_{i=1}^m |\langle a_i, x' \rangle| \le 1$.

4.4. Step 4: Putting the centers and tails together. Let $\varepsilon = c_0\delta^2$ for a sufficiently small absolute constant $c_0 > 0$. To control the centers, we choose an $\varepsilon$-net $\mathcal{N}_\varepsilon$ of $K$ as in Lemma 4.6, and we shall apply this lemma with $\delta/2$ instead of $\delta$. Note that requirement (4.4) becomes

$m \ge C_2 \delta^{-5} s \log\Big(\frac{2n}{s}\Big),$

and it is satisfied by the assumption of Theorem 4.2, for a sufficiently large absolute constant $C$. So Lemma 4.6 yields that with probability at least $1 - \exp(-\delta m/200)$, the following separation of centers holds:

(4.7)  For every $x_0, y_0 \in \mathcal{N}_\varepsilon$ such that $\|x_0 - y_0\|_2 \ge \delta/2$, one has $|I_{\delta/24}(x_0, y_0)| \ge \delta m/48$.

To control the tails, we choose $t = 4s/\varepsilon^2 \sim s/\delta^4$ as in Decomposition Lemma 4.3, and we shall apply Lemma 4.7. Note that requirement (4.6) becomes

$m \ge C_3 \delta^{-4} s \log\Big(\frac{2n}{s}\Big),$

and it is satisfied by the assumption of Theorem 4.2, for a sufficiently large absolute constant $C$. So Lemma 4.7 yields that with probability at least $1 - 2\exp(-cm)$, the following control of tails holds:

(4.8)  For every $x' \in K_{n,t}$, one has $\frac{1}{m}\sum_{i=1}^m |\langle a_i, x'\rangle| \le 1$.
Now we combine the centers and the tails. With probability at least $1 - 2\exp(-c\delta m)$, both events (4.7) and (4.8) hold. Suppose both these events indeed hold, and consider a pair of vectors $x, y \in K$ as in the assumption, so $\|x - y\|_2 \ge \delta$. We decompose these vectors according to Lemma 4.3:

(4.9)  $x = x_0 + \varepsilon x', \qquad y = y_0 + \varepsilon y',$

where $x_0, y_0 \in \mathcal{N}_\varepsilon$ and $x', y' \in K_{n,t}$. By the triangle inequality and the choice of $\varepsilon$, the centers are far apart:

$\|x_0 - y_0\|_2 \ge \|x - y\|_2 - 2\varepsilon \ge \delta - 2\varepsilon = \delta - 2c_0\delta^2 \ge \delta/2.$

Then event (4.7) implies that the separating set

(4.10)  $I_0 := I_{\delta/24}(x_0, y_0)$ satisfies $|I_0| \ge \delta m/48$.
Furthermore, using (4.8) for the tails $x'$ and $y'$ we see that

$\frac{1}{m}\sum_{i=1}^m |\langle a_i, x'\rangle| + \frac{1}{m}\sum_{i=1}^m |\langle a_i, y'\rangle| \le 2.$

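By Markov's inequality, the set

$I' := \Big\{ i \in [m] : |\langle a_i, x'\rangle| + |\langle a_i, y'\rangle| \le \frac{192}{\delta} \Big\}$ satisfies $|(I')^c| \le \frac{\delta m}{96}.$

Indeed, the $m$ nonnegative terms $|\langle a_i, x'\rangle| + |\langle a_i, y'\rangle|$ sum to at most $2m$, so at most $2m/(192/\delta) = \delta m/96$ of them can exceed $192/\delta$.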
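The counting form of Markov's inequality used here can be illustrated on toy data. In the sketch below, the list `values` stands in for the quantities $|\langle a_i, x'\rangle| + |\langle a_i, y'\rangle|$; the data, the value of `delta` and the helper names are hypothetical and chosen only so that the average equals 2 and the bound is nearly attained.

```python
def exceedances(values, threshold):
    """Count entries strictly larger than threshold."""
    return sum(1 for v in values if v > threshold)

def markov_cap(values, threshold):
    """Markov's bound on that count: (sum of values) / threshold."""
    return sum(values) / threshold

m, delta = 1000, 0.96
threshold = 192 / delta
# Adversarial toy data with average 2: nine large entries, the rest zero.
values = [2 * m / 9] * 9 + [0.0] * (m - 9)

bad = exceedances(values, threshold)
# The number of indices outside I' is capped by delta * m / 96.
assert bad <= markov_cap(values, threshold) <= delta * m / 96 + 1e-9
```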



We claim that

$I := I_0 \cap I'$

is a set of indices $i$ that satisfies the conclusion of Theorem 4.2. Indeed, the number of indices in $I$ is as required since

$|I| \ge |I_0| - |(I')^c| \ge \frac{\delta m}{48} - \frac{\delta m}{96} = \frac{\delta m}{96}.$

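Further, let us fix $i \in I$. Using decomposition (4.9) we can write

$\langle a_i, x\rangle = \langle a_i, x_0\rangle + \varepsilon\langle a_i, x'\rangle.$

Since $i \in I \subseteq I_0 = I_{\delta/24}(x_0, y_0)$, we have $\langle a_i, x_0\rangle > \delta/24$, while from $i \in I \subseteq I'$ we obtain $\langle a_i, x'\rangle \ge -192/\delta$. Thus

$\langle a_i, x\rangle > \frac{\delta}{24} - \frac{192\varepsilon}{\delta} \ge \frac{\delta}{30},$

where the last estimate follows by the choice of $\varepsilon = c_0\delta^2$ for a sufficiently small absolute constant $c_0 > 0$. In a similar way one can show that

$\langle a_i, y\rangle < -\frac{\delta}{24} + \frac{192\varepsilon}{\delta} \le -\frac{\delta}{30}.$

This completes the proof of Theorem 4.2. □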
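The phrase "sufficiently small absolute constant" can be made explicit with exact rational arithmetic. The sketch below, with the hypothetical helper `margins`, computes the largest $c_0$ for which $\delta/24 - 192\varepsilon/\delta \ge \delta/30$ holds when $\varepsilon = c_0\delta^2$, and checks that the same value also keeps the centers at distance at least $\delta/2$. The sample values of $\delta$ are arbitrary, and the paper itself does not state a numerical value for $c_0$.

```python
from fractions import Fraction

# delta/24 - 192*c0*delta >= delta/30  <=>  c0 <= (1/24 - 1/30) / 192.
c0_max = (Fraction(1, 24) - Fraction(1, 30)) / 192

def margins(delta: Fraction, c0: Fraction):
    """Return the lower bound on <a_i, x> and the center separation."""
    eps = c0 * delta**2
    inner_lower = delta / 24 - 192 * eps / delta
    center_gap = delta - 2 * eps
    return inner_lower, center_gap

for delta in (Fraction(1, 10), Fraction(1, 2), Fraction(99, 100)):
    inner_lower, center_gap = margins(delta, c0_max)
    assert inner_lower >= delta / 30
    assert center_gap >= delta / 2
```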

5. Effective sparsity of solutions 

In this section we prove Theorem 2.3 about the effective sparsity of the solution of the convex optimization problem (1.3). Our proof consists of two steps: a lower bound for $\|\hat{x}\|_2$ proved in Lemma 5.1 below, and an upper bound on $\|\hat{x}\|_1$ which we can deduce from Lemma 5.4 in the Appendix.

Lemma 5.1 (Euclidean norm of solutions). Let $n, m > 0$. Then, with probability at least $1 - C\exp(-cm\log(2n/m + 2m/n))$, the following holds uniformly for all signals $x \in \mathbb{R}^n$. Let $y = \mathrm{sign}(Ax)$. Then the solution $\hat{x}$ of the convex minimization program (1.3) satisfies

$\|\hat{x}\|_2 \ge c/\sqrt{\log(2n/m + 2m/n)}.$
Remark 6. Note that the sparsity of the signal $x$ plays no role in Lemma 5.1; the result holds uniformly for all signals $x$.



Let us assume that Lemma 5.1 is true for a moment, and show how together with Lemma 5.4 it 
implies Theorem 2.3. 

Proof of Theorem 2.3. With probability at least $1 - C\exp(-cm)$, the conclusions of both Lemma 5.1 and Lemma 5.4 with $t = 1/4$ hold. Assume this event occurs. Consider a signal $x$ as in Theorem 2.3 and the corresponding solution $\hat{x}$ of (1.3). By Lemma 5.1, the latter satisfies

(5.1)  $\|\hat{x}\|_2 \ge c/\sqrt{\log(2n/m + 2m/n)}.$

Next, consider

$\lambda = \frac{1}{m}\sum_{i=1}^m |\langle a_i, x\rangle| = \frac{1}{m}\|Ax\|_1.$

Since by the assumption on $x$ we have $x/\|x\|_2 \in K_{n,s} \cap S^{n-1}$, Lemma 5.4 with $t = 1/4$ implies that $\lambda/\|x\|_2 \ge \sqrt{2/\pi} - 1/4 > 1/2$, that is,

(5.2)  $\lambda \ge \|x\|_2/2.$

By definition of $\lambda$, the vector $\lambda^{-1}x$ is feasible for the program (1.3), so the solution $\hat{x}$ of this program satisfies

$\|\hat{x}\|_1 \le \|\lambda^{-1}x\|_1 = \lambda^{-1}\|x\|_1.$

Putting this together with (5.2) and (5.1), we conclude that

$\frac{\|\hat{x}\|_1}{\|\hat{x}\|_2} \le \frac{\|x\|_1}{\lambda\,\|\hat{x}\|_2} \le \frac{2\|x\|_1}{\|x\|_2\,\|\hat{x}\|_2} \le \frac{\|x\|_1}{\|x\|_2} \cdot C\sqrt{\log(2n/m + 2m/n)}.$

This completes the proof of Theorem 2.3. □

In the rest of this section we prove Lemma 5.1. The argument is based on the observation that the set of possible solutions $\hat{x}$ of the convex program (1.3), over all $x$ and corresponding $y$, is finite, and its cardinality can be bounded by $\exp(Cm\log(2n/m + 2m/n))$. For each fixed solution $\hat{x}$, a lower bound on $\|\hat{x}\|_2$ will be deduced from Gaussian concentration inequalities, and the argument will be finished by taking a union bound over $\hat{x}$.

It may be convenient to recast the convex minimization program (1.3) as a linear program by introducing the dummy variables $u = (u_1, u_2, \ldots, u_n)$:

(5.3)  $\min \sum_{i=1}^n u_i$  such that:

    $-u_i \le x'_i \le u_i$,  $i = 1, 2, \ldots, n$;

    $y_i\langle a_i, x'\rangle \ge 0$,  $i = 1, 2, \ldots, m$;

    $\frac{1}{m}\sum_{i=1}^m y_i\langle a_i, x'\rangle \ge 1$.
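The constraints of (5.3) are easy to check numerically. The sketch below verifies, on simulated data, the feasibility claim used in the proof of Theorem 2.3: the rescaled signal $\lambda^{-1}x$, paired with $u_i = |\lambda^{-1}x_i|$, satisfies all three constraint families. The helper `is_feasible`, the toy signal, the dimensions, the seed and the tolerance are assumptions made for illustration; the code only tests feasibility and does not solve the linear program.

```python
import random

def sign(v):
    return 1.0 if v >= 0 else -1.0

def is_feasible(A, y, x_prime, u, tol=1e-9):
    """Check the three constraint families of (5.3) for the point (x', u)."""
    m = len(A)
    inner = [sum(a_ij * xj for a_ij, xj in zip(row, x_prime)) for row in A]
    box = all(-ui - tol <= xi <= ui + tol for xi, ui in zip(x_prime, u))
    signs = all(yi * ip >= -tol for yi, ip in zip(y, inner))
    scale = sum(yi * ip for yi, ip in zip(y, inner)) / m >= 1 - tol
    return box and signs and scale

random.seed(1)
m, n = 40, 100
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
x = [1.0, -2.0, 0.5] + [0.0] * (n - 3)        # hypothetical 3-sparse signal
Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
y = [sign(v) for v in Ax]                      # one-bit measurements
lam = sum(abs(v) for v in Ax) / m              # lambda = ||Ax||_1 / m
x_scaled = [xj / lam for xj in x]
u = [abs(xj) for xj in x_scaled]
assert is_feasible(A, y, x_scaled, u)
objective = sum(u)   # equals ||x||_1 / lambda, an upper bound on the LP optimum
```

The zero vector fails the last constraint, which is what rules out the trivial solution.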

The feasible set of the linear program (5.3) is a polytope in $\mathbb{R}^{2n}$, and the linear program attains a solution on a vertex of this polytope. Further, since the random Gaussian vectors $a_i$ are in general position, one can check that the solution of the linear program is unique with probability 1. Thus, by characterizing these vertices and pointing out the relationship between $u_i$ and $x_i$, we may reduce the space of possible solutions $\hat{x}$. This is the content of our next lemma. Given subsets $T \subseteq \{1, 2, \ldots, n\}$, $\Omega \subseteq \{1, 2, \ldots, m\}$, we define $A_T^\Omega$ to be the submatrix of $A$ with columns indexed by $T$ and rows indexed by $\Omega$.

Lemma 5.2 (Vertices of the feasible polytope). With probability 1, the linear program (5.3) attains a solution $(\hat{x}, u)$ at a point which satisfies the following for some $T \subseteq \{1, 2, \ldots, n\}$ and $\Omega \subseteq \{1, 2, \ldots, m\}$:

(1) $u_i = |\hat{x}_i|$;

(2) $\mathrm{supp}(\hat{x}) = T$;

(3) $|T| = |\Omega| + 1$;

(4) $A_T^\Omega \hat{x}_T = 0$;

(5) $\frac{1}{m}\sum_{i=1}^m |\langle a_i, \hat{x}\rangle| = 1$.

Proof. Part (1) follows since we are minimizing $\sum u_i$. Part (5) follows since

$\frac{1}{m}\sum_{i=1}^m y_i\langle a_i, \hat{x}\rangle = \frac{1}{m}\sum_{i=1}^m |\langle a_i, \hat{x}\rangle|$

combined with the fact that we are implicitly minimizing $\|\hat{x}\|_1$. Parts (2)-(4) will follow from the fact that (5.3) achieves its minimum at a vertex. The vertices are precisely the feasible points for which some $d$ of the inequality constraints achieve equality, provided the point is the unique solution to those $d$ equalities. Since $(\hat{x}, u) \in \mathbb{R}^{2n}$, at least $2n$ of the constraints must be equalities. We now count equalities based on $T$ and $\Omega$.

We first consider the constraints $-u_i \le x'_i \le u_i$, $i = 1, 2, \ldots, n$. If $\hat{x}_i = 0$ we have two equalities, $-u_i = \hat{x}_i$ and $u_i = \hat{x}_i$; otherwise, we have one. This gives $n + |T^c|$ equalities. Part (5) gives one more equality. This leaves us with at least $2n - n - |T^c| - 1 = |T| - 1$ equalities that must be satisfied out of the constraints $y_i\langle a_i, \hat{x}\rangle \ge 0$. Thus, we may take $|\Omega| = |T| - 1$. □

Proof of Lemma 5.1. We may disregard the dummy variables $(u_i)$ and consider that the solution $\hat{x} = x'$ must satisfy conditions (2)-(5) above for some $T$ and $\Omega$. We will show that with high probability, any such vector $x' \in \mathbb{R}^n$ is lower bounded in the Euclidean norm.

Let us first fix sets $T$ and $\Omega$, and consider a vector $x'$ satisfying (2)-(5). We represent it as

$x' = \mu\bar{x}$ for some $\mu > 0$ and $\|\bar{x}\|_2 = 1$.

Our goal is to lower bound $\mu$. By condition (4) above, we have $A_T^\Omega \bar{x}_T = 0$ which, with probability 1, completely determines the vector $\bar{x}$ up to multiplication by $\pm 1$ (since $|T| = |\Omega| + 1$ and $\bar{x}_{T^c} = 0$). Moreover, since $\mathrm{supp}(\bar{x}) = \mathrm{supp}(x') = T$, we have $0 = A_T^\Omega \bar{x}_T = A^\Omega \bar{x}$, so $\langle a_i, \bar{x}\rangle = 0$ for $i \in \Omega$. Using this together with condition (5), we obtain

$1 = \mu \cdot \frac{1}{m}\sum_{i=1}^m |\langle a_i, \bar{x}\rangle| = \mu \cdot \frac{1}{m}\sum_{i \notin \Omega} |\langle a_i, \bar{x}\rangle|$

and thus

(5.4)  $\|x'\|_2 = \mu = \Big(\frac{1}{m}\sum_{i \notin \Omega} |\langle a_i, \bar{x}\rangle|\Big)^{-1}.$

We proceed to upper bound $\frac{1}{m}\sum_{i \notin \Omega} |\langle a_i, \bar{x}\rangle|$.

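Since the random vector $\bar{x}$ depends entirely on $A_T^\Omega$, it is independent of $a_i$ for $i \notin \Omega$. Thus, by the rotational invariance of the Gaussian distribution, for any fixed vector $z$ with unit norm, we have the following distributional estimates:²

$\frac{1}{m}\sum_{i \notin \Omega} |\langle a_i, \bar{x}\rangle| \overset{\mathrm{dist}}{=} \frac{1}{m}\sum_{i \notin \Omega} |\langle a_i, z\rangle| \overset{\mathrm{dist}}{\le} \frac{1}{m}\sum_{i=1}^m |\langle a_i, z\rangle|.$

² For random variables $X, Y$, the distributional inequality $X \overset{\mathrm{dist}}{\le} Y$ means that $\mathbb{P}\{X > t\} \le \mathbb{P}\{Y > t\}$ for all $t \in \mathbb{R}$.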
The last term is a sum of independent sub-Gaussian random variables, and it can be bounded using standard concentration inequalities. Specifically, applying Lemma 5.3 from the Appendix, we obtain

$\mathbb{P}\Big\{\frac{1}{m}\sum_{i=1}^m |\langle a_i, z\rangle| > t\Big\} \le C\exp(-cmt^2)$ for $t \ge 2$.

Using (5.4) and the distributional estimates above, this implies

$\mathbb{P}\{\|x'\|_2 < 1/t\} \le C\exp(-cmt^2)$ for $t \ge 2$.

It is left to upper bound the number of vectors satisfying conditions (2)-(5) and to use the union bound. Since $|T| = |\Omega| + 1$, the total number of possible choices for $T$ and $\Omega$ (and hence of $x'$) is

$\sum_{i=0}^{\min(m,\,n-1)} \binom{n}{i+1}\binom{m}{i} \le \exp(Cm\log(2n/m + 2m/n)).$
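The size of this count can be explored numerically. The sketch below evaluates the sum exactly, checks it against the closed form $\binom{n+m}{m+1}$ given by Vandermonde's identity, and compares its logarithm with $m\log(2n/m + 2m/n)$. The function names and the sample pairs $(n, m)$ are illustrative, and the threshold 2.0 in the last assertion is an ad hoc value for these samples, not the absolute constant $C$ of the text.

```python
import math

def count_vertex_patterns(n: int, m: int) -> int:
    """Number of pairs (T, Omega) with |T| = |Omega| + 1."""
    return sum(math.comb(n, i + 1) * math.comb(m, i)
               for i in range(min(m, n - 1) + 1))

def exponent_scale(n: int, m: int) -> float:
    """The quantity m * log(2n/m + 2m/n) appearing in the bound."""
    return m * math.log(2 * n / m + 2 * m / n)

for n, m in [(50, 10), (100, 100), (20, 200), (1000, 30)]:
    total = count_vertex_patterns(n, m)
    # Vandermonde's identity collapses the sum to one binomial coefficient.
    assert total == math.comb(n + m, m + 1)
    ratio = math.log(total) / exponent_scale(n, m)
    assert ratio < 2.0
```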



Thus, by picking $t = C_0\sqrt{\log(2n/m + 2m/n)}$ with a sufficiently large absolute constant $C_0$, we find that all $x'$ uniformly satisfy the required estimate $\|x'\|_2 \ge c/\sqrt{\log(2n/m + 2m/n)}$ with probability at least $1 - \exp(Cm\log(2n/m + 2m/n)) \cdot C\exp(-cmt^2) \ge 1 - C\exp(-cm\log(2n/m + 2m/n))$. Lemma 5.1 is proved. □

Appendix. Uniform concentration inequality. 
In this section we prove concentration inequalities for

$\|Ax\|_1 = \sum_{i=1}^m |\langle a_i, x\rangle|.$



In the situation where the vector x is fixed, we have a sum of independent random variables, which 
can be controlled by standard concentration inequalities: 

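Lemma 5.3 (Concentration). Let $n, m \in \mathbb{N}$ and $x \in \mathbb{R}^n$. Then, for every $t > 0$ one has

$\mathbb{P}\Big\{\Big|\frac{1}{m}\sum_{i=1}^m |\langle a_i, x\rangle| - \sqrt{\frac{2}{\pi}}\,\|x\|_2\Big| > t\|x\|_2\Big\} \le C\exp(-cmt^2).$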






Proof. Without loss of generality we can assume that $\|x\|_2 = 1$. Then $\langle a_i, x\rangle$ are independent standard normal random variables, so $\mathbb{E}|\langle a_i, x\rangle| = \sqrt{2/\pi}$. Therefore $X_i := |\langle a_i, x\rangle| - \sqrt{2/\pi}$ are independent and identically distributed centered random variables. Moreover, $X_i$ are sub-gaussian random variables with $\|X_i\|_{\psi_2} \le C$, see [26, Remark 18]. An application of a Hoeffding-type inequality (see [26, Proposition 10]) yields

$\mathbb{P}\Big\{\Big|\frac{1}{m}\sum_{i=1}^m X_i\Big| > t\Big\} \le C\exp(-cmt^2).$

This completes the proof. □
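The concentration of $\frac{1}{m}\sum_i |\langle a_i, x\rangle|$ around $\sqrt{2/\pi}\,\|x\|_2$ can be observed in a small simulation. In the sketch below the signal, the sample size, the seed and the tolerance $0.05\|x\|_2$ are arbitrary illustrative choices, and the helper `mean_abs_projection` is a hypothetical name; the simulation illustrates the statement for one fixed $x$ and is not a substitute for the proof.

```python
import math
import random

def mean_abs_projection(m: int, x, rng) -> float:
    """(1/m) * sum_i |<a_i, x>| for fresh a_i ~ N(0, Id)."""
    total = 0.0
    for _ in range(m):
        total += abs(sum(rng.gauss(0.0, 1.0) * xj for xj in x))
    return total / m

rng = random.Random(0)
x = [3.0, -4.0, 0.0, 12.0]                 # toy signal with ||x||_2 = 13
norm = math.sqrt(sum(xj * xj for xj in x))
estimate = mean_abs_projection(20000, x, rng)
target = math.sqrt(2 / math.pi) * norm
assert abs(estimate - target) < 0.05 * norm
```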



We will now prove a stronger version of Lemma 5.3 that is uniform over all effectively sparse 
signals x. 

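Lemma 5.4 (Uniform concentration). Let $n \in \mathbb{N}$, $t \in [0, \sqrt{2/\pi}]$, and suppose that $m \ge Ct^{-4}s\log(2n/s)$. Then

$\mathbb{P}\Big\{\sup_{x \in K_{n,s} \cap S^{n-1}} \Big|\frac{1}{m}\sum_{i=1}^m |\langle a_i, x\rangle| - \sqrt{\frac{2}{\pi}}\Big| > t\Big\} \le C\exp(-cmt^2).$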
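Proof. This is a standard covering argument, although the approximation step requires a little extra care. Let $M$ be a $t/4$-net of $K_{n,s} \cap S^{n-1}$. Since $K_{n,s} \cap S^{n-1} \subseteq K_{n,s}$, we can arrange by Lemma 3.4 that

$|M| \le \exp(Ct^{-2}s\log(2n/s)).$

By definition, for any $x \in K_{n,s} \cap S^{n-1}$ one can find $x_0 \in M$ such that $\|x - x_0\|_2 \le t/4$. So the triangle inequality yields

$\Big|\frac{1}{m}\sum_{i=1}^m |\langle a_i, x\rangle| - \sqrt{\frac{2}{\pi}}\Big| \le \Big|\frac{1}{m}\sum_{i=1}^m |\langle a_i, x_0\rangle| - \sqrt{\frac{2}{\pi}}\Big| + \frac{1}{m}\sum_{i=1}^m |\langle a_i, x - x_0\rangle|.$

Note that $\|x - x_0\|_1 \le \|x\|_1 + \|x_0\|_1 \le 2\sqrt{s}$. Together with $\|x - x_0\|_2 \le t/4$ this means that

$x - x_0 \in \frac{t}{4} \cdot K_{n,\,64s/t^2};$

indeed, dividing $x - x_0$ by $t/4$ gives a vector with $\ell_2$ norm at most $1$ and $\ell_1$ norm at most $8\sqrt{s}/t = \sqrt{64s/t^2}$. Consequently,

(5.5)  $\sup_{x \in K_{n,s} \cap S^{n-1}} \Big|\frac{1}{m}\sum_{i=1}^m |\langle a_i, x\rangle| - \sqrt{\frac{2}{\pi}}\Big| \le \sup_{x_0 \in M} \Big|\frac{1}{m}\sum_{i=1}^m |\langle a_i, x_0\rangle| - \sqrt{\frac{2}{\pi}}\Big| + \frac{t}{4} \cdot \sup_{w \in K_{n,\,64s/t^2}} \frac{1}{m}\sum_{i=1}^m |\langle a_i, w\rangle| =: R_1 + \frac{t}{4} \cdot R_2.$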

We bound the terms $R_1$ and $R_2$ separately. For simplicity of notation, we assume that $64s/t^2$ is an integer, as the non-integer case will have no significant effect on the result.

A bound on $R_1$ follows from the concentration estimate in Lemma 5.3 and a union bound:

(5.6)  $\mathbb{P}\{R_1 > t/4\} \le |M| \cdot C\exp(-cmt^2) \le C\exp(Ct^{-2}s\log(2n/s) - cmt^2) \le C\exp(-cmt^2)$

provided that $m \ge Ct^{-4}s\log(2n/s)$.

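Next, due to Lemma 3.1 and Jensen's inequality, we have

$R_2 \le 2\sup_{w \in S_{n,\,64s/t^2}} \frac{1}{m}\sum_{i=1}^m |\langle a_i, w\rangle| \le 2\sup_{w \in S_{n,\,64s/t^2}} \Big(\frac{1}{m}\sum_{i=1}^m \langle a_i, w\rangle^2\Big)^{1/2} =: 2R_2'.$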

The quantity $R_2'$ has been well studied in compressed sensing; it is bounded by the restricted isometry constant of the matrix $\frac{1}{\sqrt{m}}A$ at sparsity level $64s/t^2$. Probabilistic bounds for the restricted isometry constants of Gaussian matrices are well known, and have been derived in the earliest compressed sensing works [10]. We use the bound in [26, Theorem 65] that gives

(5.7)  $\mathbb{P}\{R_2' > 1.5\} \le 2\exp(-cm)$

provided that $m \ge Ct^{-2}s\log(n/s)$. Thus

$\mathbb{P}\{R_2 > 3\} \le 2\exp(-cm).$

Combining this and (5.6), and noting that $R_1 \le t/4$ and $R_2 \le 3$ together give $R_1 + \frac{t}{4} \cdot R_2 \le t$, we conclude that

$\mathbb{P}\Big\{R_1 + \frac{t}{4} \cdot R_2 > t\Big\} \le C\exp(-cmt^2),$

where we used the assumption that $t \le \sqrt{2/\pi}$. This and (5.5) complete the proof.